SAR ship target detection method based on CNN structure with wavelet and attention mechanism

Ship target detection in synthetic aperture radar (SAR) images is an important application field. Because of sea clutter, especially when imaging areas with large waves, SAR images contain a great deal of complex noise, which makes effective detection of ship targets challenging. Although deep semantic segmentation networks have been widely used for ship target detection in recent years, they cannot fully utilize the global information of the image. To solve this problem, this paper proposes a new convolutional neural network (CNN) method based on wavelets and an attention mechanism, called the WA-CNN algorithm. The new method builds on the U-Net structure, which not only effectively reduces the depth of the network structure but also significantly reduces the complexity of the network. The basic network of the WA-CNN algorithm consists of an encoder and a decoder. The dual-tree complex wavelet transform (DTCWT) is introduced into the pooling layer of the encoder to smooth the speckle noise in SAR images, which helps preserve the contour structure and detail information of the target in the feature map. An attention mechanism is added to the decoder to obtain the global information of the ship target. Two public SAR image datasets were used to verify the proposed method, and good experimental results were obtained, which shows that the proposed method is effective and feasible.


Introduction
Ship target detection in remote sensing images is very important for marine supervision, including the monitoring of illegal smuggling, traffic management, oil spill detection, piracy and fishery management [1][2][3][4][5]. The images used for ship detection mainly include reflected infrared, thermal infrared, optical and radar images. Unlike the other three, radar imaging can acquire data continuously at all times and in all weather, and it has been widely used in many fields [6][7][8][9]. In recent years, with the launch of SAR satellites such as Sentinel-1 [10], TerraSAR-X [11] and Gaofen-3, more and more high-resolution SAR images have become available, which greatly improves the efficiency of ship detection in ocean management [12][13][14][15]. ... but also have a large number of parameters. At the same time, for ship detection and segmentation in SAR images, they usually do not consider the influence of speckle noise and other factors on the detection result. Moreover, as the network becomes deeper, the features in the intermediate feature maps become weaker and weaker. Through an in-depth study of existing ship target detection algorithms, we found that reducing the network complexity, eliminating the influence of noise and making full use of the features of the middle layers of the network can improve network performance. Based on these observations, this paper proposes a new CNN structure that combines the wavelet transform and an attention mechanism to detect ship targets in SAR images effectively. The new method is therefore called the WA-CNN ship target detection algorithm, or the WA-CNN method for short.
The WA-CNN algorithm is an improvement on the U-Net network structure [34], so its basic structure is the same as that of U-Net: an encoder and a decoder. The WA-CNN algorithm takes into account the characteristics of SAR images and the advantages and disadvantages of U-Net. It therefore not only reduces the number of parameters and the depth of the U-Net structure, but also improves the efficiency of the network and the detection of ship targets, which makes it suitable for SAR image processing. The main contributions of this paper are the following three aspects.
(1) The quality of the feature maps is effectively improved by adding the wavelet transform. According to the characteristics of SAR images, the dual-tree complex wavelet transform is added in the encoding stage of the network to remove speckle noise and improve the quality of the feature maps. Speckle noise and directional sensitivity are typical characteristics of SAR imaging, and the dual-tree complex wavelet transform offers more directional selectivity, so it is well suited to SAR image processing and can extract more complete directional information. In addition, after multi-scale decomposition, speckle noise is usually concentrated in the high-frequency sub-images, so selecting the low-frequency sub-images filters out much of the speckle noise.
(2) An attention mechanism is introduced to improve the utilization of middle-layer features and the extraction of global information. In the decomposition layer, some convolutional networks extract only local information and make poor use of the features. The WA-CNN algorithm addresses this by creating an attention mechanism layer in the decoder, which improves the utilization of the middle layers and makes full use of global information during feature calculation, so that the acquired information is more accurate.
(3) A wavelet decomposition pooling layer is designed in the network structure, which not only greatly reduces the depth and the number of parameters of the U-Net network, but also improves the efficiency of the network.
The rest of this paper is organized as follows. Section 2 introduces the principle and implementation of the WA-CNN algorithm. Section 3 explains the experimental data, the experimental design and the performance evaluation indicators. The experimental results are analyzed and discussed in detail in Section 4. Section 5 summarizes the paper and outlines future research work.

A. Construction of WA-CNN network structure
The structure of the WA-CNN network is shown in Fig 1. Like U-Net, it consists of an encoder and a decoder. The encoder is composed of three stages, and consecutive stages are connected by a pooling layer. Because the dual-tree complex wavelet transform (DTCWT) performs well in terms of direction selectivity, redundancy and reconstruction, the pooling layers in the encoder use DTCWT to perform the pooling operation, which is referred to as wavelet pooling. Its purpose is to reduce the influence of the speckle noise of SAR images and to preserve more structural information, such as edges, endpoints and corners. The decoder is also composed of three stages, and consecutive stages are connected by an up-sampling layer, which enlarges the feature map to twice its input size. An attention mechanism is constructed in the decoder to improve feature extraction and fusion, which helps extract global information.
To improve the computing performance of the network, the original input image is cropped to 256×256. In the first stage of the encoder, the convolution layer performs three convolution operations with 64 kernels of size 3×3, so the depth of the feature map is 64. In the second stage, the input feature map is first processed by wavelet pooling, which yields a feature map of half the input size with a depth of 64. The result is then subjected to three convolution operations with 128 kernels of size 3×3, giving a feature map with a depth of 128. In the third stage, wavelet pooling yields a feature map of half the input size with a depth of 128, and convolution with 256 kernels of size 3×3 then gives a feature map with a depth of 256. The output feature map of the third encoder stage is used as the input of the first decoder stage. The decoder is symmetrical to the encoder. However, the feature map obtained by each up-sampling layer is fused with the corresponding encoder feature map to increase the number of channels, and an attention mechanism is introduced during this feature fusion so that the output feature map contains global information. In the last stage of the decoder, the feature map with a depth of 64 is convolved with two kernels of size 1×1, which produces a predicted feature map with two channels representing the ship target and the background, respectively.
The implementation of wavelet pooling and the attention mechanism in the network is described in detail in sub-sections B and C below.
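The stage-by-stage feature map sizes described above can be traced with a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function name `encoder_shapes` is hypothetical, and it assumes that the 3×3 convolutions use padding that preserves the spatial size, which the text does not state explicitly.

```python
def encoder_shapes(size=256, channels=(64, 128, 256)):
    """Return (height, width, depth) of the feature map after each encoder stage.

    Assumption: convolutions keep the spatial size unchanged, and the wavelet
    pooling layer between consecutive stages halves height and width.
    """
    shapes = []
    for stage, depth in enumerate(channels):
        if stage > 0:
            size //= 2  # wavelet pooling halves the spatial size
        shapes.append((size, size, depth))
    return shapes


for stage, shape in enumerate(encoder_shapes(), start=1):
    print(f"encoder stage {stage}: {shape}")
```

Under these assumptions the deepest feature map is 64×64 with a depth of 256, and the symmetric decoder reverses the sequence back to 256×256 with a depth of 64.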

B. Principle of wavelet pooling
The convolutional layers in a CNN extract the most basic visual features, such as endpoints and corners, and these features form high-level abstract features in subsequent layers. Preserving these features is very important for segmentation. However, the inherent speckle noise in SAR images affects feature extraction and therefore the detection and segmentation of ship targets, and simple max pooling often loses some detail features. To reduce the influence of speckle noise and retain more detailed feature information, a wavelet pooling layer is introduced to replace the traditional max pooling layer. In the wavelet pooling layer, DTCWT is chosen for the pooling operation because of its advantages: approximate shift invariance, efficient sequential computation, limited redundancy, perfect reconstruction and good directional selectivity. Applying DTCWT to the feature map generated by a convolution layer produces two low-frequency coefficient sub-images, $LL_1$ and $LL_2$, and six high-frequency coefficient sub-images in different directions, $HL_1$, $HL_2$, $LH_1$, $LH_2$, $HH_1$ and $HH_2$, corresponding to the directions ±15°, ±45° and ±75°. The average of the two low-frequency coefficient sub-images is used as the output of the wavelet pooling layer.
By introducing DTCWT into the deep convolutional network structure, the structural feature information of the input layer is preserved in the low-frequency coefficient sub-images, while the speckle noise of the SAR image, which is concentrated in the high-frequency coefficient sub-images, is suppressed by discarding them. Similar to the max pooling layer, the input of the wavelet pooling layer is the output of the convolutional layer, as shown in Fig 2. In the wavelet pooling layer, each input feature map is transformed by DTCWT into eight sub-image feature maps, as shown in Eq (1).
$$\mathrm{DTCWT}(x_i) = \{LL_1, LL_2, HL_1, HL_2, LH_1, LH_2, HH_1, HH_2\} \tag{1}$$
where $x_i$ denotes the input feature image. The low-frequency sub-images $LL_1$ and $LL_2$ are then averaged to obtain $LL_{average}$, which is used as the output of the wavelet pooling layer:
$$LL_{average} = \frac{LL_1 + LL_2}{2} \tag{2}$$
In the wavelet pooling layer shown in Fig 2, the low-frequency coefficient sub-images generated by DTCWT reduce the influence of SAR image speckle noise while maintaining good structural features, which is very beneficial for target segmentation in SAR images.
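The pooling rule, averaging two low-frequency sub-images and discarding the high-frequency ones, can be illustrated with a small pure-Python sketch. It is important to note what is assumed here: the code does not implement a real DTCWT. As a stand-in, each "tree" is a 2×2 block average (a Haar-style low-pass band), and the second tree is simply sampled on a grid shifted by one pixel with wrap-around. The function names `haar_lowpass` and `wavelet_pool` are hypothetical.

```python
def haar_lowpass(image, row_shift=0, col_shift=0):
    """Half-resolution low-pass band: the mean of each 2x2 block.

    The shifts select a circularly shifted sampling grid, a crude stand-in
    for the second tree of the DTCWT (an assumption, not a real DTCWT).
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = [
                image[(r + dr + row_shift) % rows][(c + dc + col_shift) % cols]
                for dr in (0, 1)
                for dc in (0, 1)
            ]
            out_row.append(sum(block) / 4.0)
        out.append(out_row)
    return out


def wavelet_pool(image):
    """Average two low-frequency sub-images: (LL1 + LL2) / 2."""
    ll1 = haar_lowpass(image)
    ll2 = haar_lowpass(image, row_shift=1, col_shift=1)
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)] for r1, r2 in zip(ll1, ll2)]


feature_map = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [3.0, 3.0, 7.0, 7.0],
    [3.0, 3.0, 7.0, 7.0],
]
pooled = wavelet_pool(feature_map)
print(pooled)  # a 2x2 map: half the input size in each dimension
```

Like max pooling, the layer halves the height and width of the feature map, but the output is a smoothed average rather than a local maximum, which is what makes it less sensitive to isolated speckle values.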

C. Capturing features based on attention mechanism
In a traditional CNN, deep-level feature extraction is realized by convolving the input image with a convolution kernel. However, because the convolution kernel is small relative to the original image or the target area, part of the information cannot be effectively extracted, which affects the final detection or segmentation of the target area. As the number of convolutional layers increases, this defect becomes more and more serious. Solving this problem is very important for capturing the detailed structure of ship targets in SAR images, and it can be done by adding an attention mechanism at a suitable location in the network.
In the WA-CNN algorithm, the attention mechanism improves the accuracy of ship target detection in SAR images in three steps: similar feature extraction, feature similarity calculation and original feature enhancement. The principle block diagram is shown in Fig 4.
Step 1: Similar feature extraction. First, the features in the original feature map are divided into three spaces A, B and C. The distribution of the spatial features of A and B is similar to that of C, so the features of the C space can be enhanced by the features of the A and B spaces. The features of the encoding part and the decoding part of the network generate the A and B spaces, respectively. The encoder and decoder features are also concatenated along the channel dimension, and the concatenated features are used to generate the features in the C space. All three spaces are generated by convolution layers. In Fig 4, the feature maps in the encoding and decoding process are denoted by E and D, with $\{E, D\} \in \mathbb{R}^{W \times H \times CH}$, where W, H and CH represent the width, height and number of channels of the feature map, respectively. The convolution operation reduces the number of channels to 1/4 of that of the original feature. To calculate feature similarity, the three-dimensional features must be reduced to two dimensions, so the channel-reduced three-dimensional features are reshaped into the two-dimensional feature matrices A and B.
Step 2: Feature similarity calculation. The similarity matrix between A and B is computed by a dot product operation. The similarity matrix is then activated by the SoftMax function to obtain the attention feature map $F \in \mathbb{R}^{(W \times H) \times (W \times H)}$, where $F_{ij}$ represents the correlation between the ith feature in matrix A and the jth feature in matrix B. The attention feature map is a coefficient matrix with values between 0 and 1. It reflects the similarity between any two points in the matrices A and B and is used to enhance the C space.
Step 3: Original feature enhancement. The encoding feature and the decoding feature are stacked along the channel dimension to form a concatenated feature of dimension W×H×2CH. A convolutional layer reduces the number of channels of the concatenated feature to obtain a new feature map $I \in \mathbb{R}^{W \times H \times CH}$. To enhance the original feature, feature map I is reshaped into matrix C, and the matrix C is multiplied by the two-dimensional attention map to obtain an enhanced feature map. The enhanced feature map is reshaped from two dimensions back to a three-dimensional feature map in $\mathbb{R}^{W \times H \times CH}$ through the convolutional layer. It can be regarded as a coefficient matrix used to enhance the original features. Its mathematical model is given in Eq (5), where $Q_j$ is the jth feature generated by the attention mechanism, and ε is a coefficient that measures the proportion of attention features among all features. Finally, the attention feature map and the original feature map are added to obtain the final enhanced feature map.
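The three steps can be sketched on small matrices in pure Python. This is an illustration under stated assumptions, not the authors' code: the similarity matrix is taken as the product of A with the transpose of B, the SoftMax is applied row by row, the attention features are Q = F·C, and the final output is ε·Q plus the original features. The normalization axis and the exact form of the output are assumptions, and the helper names `matmul`, `softmax_rows` and `attention_enhance` are hypothetical. The convolutions that produce A, B and C are omitted.

```python
import math


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row of a matrix."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_enhance(a, b, c, original, eps):
    """Return eps * (F @ c) + original, with F = softmax(a @ b^T).

    a, b: (N x CH/4) matrices reshaped from encoder and decoder features.
    c, original: (N x CH) matrices reshaped from the concatenated features.
    N = W * H is the number of spatial positions.
    """
    b_t = [list(col) for col in zip(*b)]
    similarity = matmul(a, b_t)           # Step 2: N x N dot-product similarity
    attention = softmax_rows(similarity)  # coefficients between 0 and 1
    q = matmul(attention, c)              # Step 3: attention features Q
    return [
        [eps * qv + ov for qv, ov in zip(q_row, o_row)]
        for q_row, o_row in zip(q, original)
    ]


# Toy example with N = 2 positions; the numbers are arbitrary.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
c = [[1.0, 2.0], [3.0, 4.0]]
print(attention_enhance(a, b, c, c, eps=0.5))
```

With ε = 0 the function returns the original features unchanged, which corresponds to a network without attention. Because every output position mixes features from all N positions, the result carries global rather than purely local information.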

A. Data introduction
The experimental data come from two public datasets, SSDD [35] and SAR-SHIP-SET [36]. The SSDD dataset contains 1160 images and 2356 ships, and the images are approximately 500×500 in size. The data were mainly acquired by satellites such as RadarSat-2, TerraSAR-X and Sentinel-1, covering the four polarization modes HH, HV, VV and VH, with a spatial resolution of 1 m to 15 m. The ship targets appear both in large open-sea areas and in coastal areas. To improve the training speed of the network, the original images are cropped into small images of size 256×256. The dataset does not prescribe a split into training and test sets. To compare the different methods under the same conditions, and because the SSDD dataset is relatively large, half of the data is used for training and the other half for testing.

PLOS ONE

The SAR-SHIP-SET dataset contains 210 scenes of SAR ship images: 102 scenes were acquired by the Chinese Gaofen-3 satellite and the other 108 by the Sentinel-1 satellite. To facilitate calculation and processing, the original SAR images are cropped into images of size 256×256, yielding 43819 ship slice images in total. The entire dataset is randomly divided into a training set and a test set, in proportions of 70% and 30%, respectively.

B. Training parameter setting
The WA-CNN network uses batch iterative methods to complete the network training. During the training process, the batch size is set to 20, and the number of iterations is 1000. The learning rate is set and updated according to the number of iterations. The number of iterations is less than 200 as the first stage, and the learning rate is set to 0.001. When the number of iterations is greater than or equal to 200 but less than 800, it is the second stage, and the learning rate is 0.0005. The third stage is when the number of iterations is greater than or equal to 800, and the learning rate is 0.0001.
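The three-stage learning rate schedule can be written as a small function. This is a minimal sketch: the function name `learning_rate` is hypothetical, and iterations are assumed to be counted from 0.

```python
def learning_rate(iteration):
    """Piecewise-constant learning rate for the three training stages."""
    if iteration < 200:   # first stage
        return 0.001
    if iteration < 800:   # second stage
        return 0.0005
    return 0.0001         # third stage


for it in (0, 199, 200, 799, 800, 999):
    print(it, learning_rate(it))
```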

C. Evaluation index
For ship detection in SAR images, there are only two types of final classification results for each pixel, namely ship pixel and non-ship pixel, and they are also called the positive class and negative class. When the detection result is compared with the true value, there will be two kinds of correct classification and wrong classification, which are recorded as true and false, respectively. If a pixel of ship is detected as a ship, this pixel is the true positive class (TP); if it is detected as a non-ship class, then this pixel is called a false negative class (FN). If a non-ship pixel is correctly detected as a non-ship class, then this pixel is called a true negative class (TN); if it is incorrectly detected as a ship class, it is called a false positive class (FP). If the values of the true positive class and the true negative class are larger, the accuracy of the correct classification of the ship target is higher.
The evaluation indexes of target detection performance include sensitivity (SE), specificity (SP), accuracy (ACC) and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Apart from AUC, the three evaluation parameters are defined as follows.
$$SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}, \quad ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
The parameter SE denotes the segmentation performance of ship target pixels, SP denotes the segmentation performance of non-ship target pixels, and ACC represents the segmentation performance of the entire image pixel. AUC is the area under the ROC curve and the maximum value of the area is one. The larger the value of AUC is, the better the segmentation effect of the network is. It should be noted that the evaluation indexes SE, SP and AUC are evaluated from the perspective of machine learning classification, that is, from the accuracy of deep learning model prediction, while ACC parameter is evaluated from the field of image pixel classification (image segmentation). Although they have different levels, the purpose of evaluation is the same.
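A short sketch shows how SE, SP and ACC follow from the pixel-level counts TP, FN, TN and FP. The function names and the toy masks are hypothetical, pixels are encoded as 1 for ship and 0 for non-ship, and the code assumes both classes are present in the ground truth so that no denominator is zero.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN and FN over two flat sequences of 0/1 pixel labels."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fp, tn, fn


def segmentation_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and accuracy from the four counts."""
    se = tp / (tp + fn)                    # ship pixels correctly detected
    sp = tn / (tn + fp)                    # non-ship pixels correctly detected
    acc = (tp + tn) / (tp + fp + tn + fn)  # all pixels correctly classified
    return se, sp, acc


truth = [1, 1, 1, 0, 0, 0, 0, 0]  # toy ground-truth mask, flattened
pred = [1, 1, 0, 0, 0, 0, 0, 1]   # toy prediction, flattened
se, sp, acc = segmentation_metrics(*confusion_counts(pred, truth))
print(f"SE={se:.3f} SP={sp:.3f} ACC={acc:.3f}")
```

Because ship pixels are usually a small fraction of a SAR scene, ACC is dominated by the background class, which is why SE and SP are reported alongside it.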

Analysis and discussion of experimental results
To verify the feasibility and effectiveness of the WA-CNN algorithm, verification experiments were carried out with SAR images from SSDD and SAR-SHIP-SET databases, and comparative experiments were also done with other networks such as FCN, U-Net and DeepLabv3+. The evaluation indexes mentioned above are used to analyze the experimental results obtained by different methods, and the performance of each algorithm is discussed.

A. Experiments with SSDD data set
It can be seen in Fig 5(B) that when the FCN method is used to detect Fig 5(A), the boundary of the ship is missing. In Fig 5(B), there are some omissions in ship target detection. In Fig 5(E) and 5(F), strong speckle noise is mistakenly detected as a ship target. In Fig 5(H), the boundaries between the detection results of adjacent ships are not clear. In Fig 5(J), the detected ship target is incomplete.
The result of the U-Net method is shown in Fig 5(C). The experimental results obtained by U-Net are very similar to those obtained by the FCN method and are not ideal. In Fig 5(A), the detection is acceptable, and the boundary of the ship is relatively complete. However, there is a missed detection in Fig 5(B), the waves generated by the moving ship are mistakenly detected as ships in Fig 5(C), and some islands are detected as ship targets in Fig 5(D). For the SAR images shown in Fig 5(E) and 5(F), the speckle noise is relatively strong and is mistakenly detected as a ship. In Fig 5(H ...
The image shown in Fig 5(D) is the detection result of DeepLabv3+. In general, the edges of the ships are relatively complete and are less affected by speckle noise. The detected ship targets are well distinguished from ports, coasts and islands, and no ships are missed at high resolution. The only disadvantage is that the small ship target with weak scattering in Fig 5(B) is missed.
Visually, the experimental results obtained by the WA-CNN algorithm, shown in Fig 5(E), are very good. The algorithm not only recognizes ship targets accurately in SAR images of different scenes, but also preserves a complete edge structure. In both low-resolution and high-resolution SAR images, the ships are detected effectively, without missed or false detections, and the influence of speckle noise, waves, ports, islands and adjacent ships is overcome.
The analysis of the above experimental results shows that, for ship detection in SAR images from the SSDD dataset, the WA-CNN algorithm gives the best detection, followed by DeepLabv3+, and finally FCN and U-Net. To further compare the performance of these four methods, the four evaluation parameters SE, SP, ACC and AUC are used, and the results are shown in Fig 6. Here, the abscissa represents the different evaluation indexes, and the ordinate is the value of each index. It is worth noting that the parameter values of each index in Fig 6 ...
As can be seen in Fig 6, for all four evaluation indexes SE, SP, ACC and AUC, the values obtained by the WA-CNN algorithm are significantly higher than those of the other three methods. This shows that the WA-CNN algorithm can effectively and completely detect ship targets in SAR images from the SSDD dataset, and that it also performs best in terms of the evaluation parameters. DeepLabv3+ is the next best algorithm, followed by FCN and U-Net. It is known in Fig 6 that ...

B. Experiments with the SAR-SHIP-SET data set
To verify the performance of the WA-CNN algorithm on other SAR image data, experiments were carried out with the SAR images in the SAR-SHIP-SET dataset, because the types of ship targets in this dataset are more varied and the imaging scenes are more complex. The experimental results and experimental images are shown in Fig 7. Fig 7( ...
In Fig 7, the detection result of the FCN algorithm is shown in Fig 7(B). For Fig 7(A), the FCN algorithm mistakenly detects some ports as ships, and in Fig 7(B), it detects islands as ship targets. In Fig 7(C), two small ship targets are missed. For Fig 7(E) and 7(F), strong noise and waves are detected inside the ship. In Fig 7(H) and 7(J), the edge extraction of the ships is not ideal: edges are missed, and the edges of adjacent ships are very difficult to distinguish. The FCN algorithm therefore performs poorly on ship targets in SAR images from the SAR-SHIP-SET dataset. The images shown in Fig 7(C) are obtained by the U-Net algorithm, whose detection results are very similar to those of the FCN algorithm. It not only mistakenly detects targets such as ports, islands, noise and waves as ships, but also often fails to detect small ships or produces unclear target edges. The detection performance of both U-Net and FCN on SAR images is therefore relatively poor. Fig 7(D) shows the detection result of the DeepLabv3+ algorithm. DeepLabv3+ is generally better than U-Net and FCN, but for the processing of ... Fig 7(E). It can be seen that the WA-CNN algorithm detects all kinds of ships in Fig 7(A)-7(J), and its results are very close to the ground truth.
The discussion above compares the detection results of the different algorithms visually. Next, the evaluation indexes SE, SP, ACC and AUC are used to evaluate the performance of the different algorithms on the SAR-SHIP-SET data. The results are shown in Fig 8, whose axes have the same meaning as those of Fig 6. The abscissa ...
That is to say, for the SAR-SHIP-SET dataset, the WA-CNN algorithm also obtains good ship target detection results, and its performance is better than that of the FCN, U-Net and DeepLabv3+ algorithms.

C. Analysis of network parameters
The complexity of the network is determined by the number of network parameters: the more parameters there are, the more complex the network is. The number of parameters is affected by two factors, the depth of the network and the structure of the network. The parameters of FCN, U-Net, DeepLabv3+ and WA-CNN are compared in Fig 9, which clearly shows that the WA-CNN algorithm has far fewer parameters than the other three algorithms.
In terms of network depth, the design of the deep network follows the principle that the length and width of the feature map become half of the original after each down sampling, and the depth of the feature map is doubled. The more down sampling, the deeper the depth of the network and feature maps, which means more convolution kernels and parameters. In the FCN, U-Net, and DeepLabv3+ algorithms, because the depth of the network and feature maps are very deep, the feature maps in the middle layer are simply superimposed, so the utilization of these feature maps with very deep depth is very low. In the WA-CNN algorithm, the deepest depth of the network is only 256. Compared with the previous three algorithms, the depth and the amount of parameters are much reduced, and the network detection performance is enhanced through wavelet pooling and attention mechanism.
In the structure of the WA-CNN algorithm, the network structure consists of an encoder and a decoder. The wavelet pooling layer is embedded in the encoder, and the attention mechanism is introduced in the decoder. The pooling layer only accounts for 5%-8% of the total parameters of the entire network, so the changes in the parameters are mainly determined by the attention mechanism.
The effect of the attention mechanism on the number of parameters is analyzed as follows. The attention mechanism replaces a normal convolution. If a normal convolution were used at the position of the attention mechanism, the number of parameters would be $2CH_{IN} \times 3 \times 3 \times CH_{OUT}$ (with $CH_{IN} = CH_{OUT}$). The attention mechanism consists of three convolution processes. The first has $2CH_{IN} \times 3 \times 3 \times CH_{OUT}$ parameters, and each of the other two has $CH_{IN} \times 3 \times 3 \times CH_{IN}/4$ parameters. The attention mechanism therefore has 25% more parameters than the normal convolution, but compared with the entire network, this increase in the decoder is negligible.
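The 25% figure can be checked by evaluating the parameter counts given above for a concrete channel number. This is a sketch under stated assumptions: bias terms are ignored, the number of input and output channels is equal, and the channel count is divisible by 4. The function names are hypothetical.

```python
def normal_conv_params(ch, kernel=3):
    """Parameters of a kernel x kernel convolution from 2*ch to ch channels."""
    return 2 * ch * kernel * kernel * ch


def attention_params(ch, kernel=3):
    """Parameters of the three convolutions in the attention mechanism."""
    first = 2 * ch * kernel * kernel * ch        # concatenated features to ch channels
    reduced = ch * kernel * kernel * (ch // 4)   # ch channels to ch/4 channels
    return first + 2 * reduced                   # two channel-reducing convolutions


ch = 64
ratio = attention_params(ch) / normal_conv_params(ch)
print(normal_conv_params(ch), attention_params(ch), ratio)
```

The two extra convolutions each add one eighth of the cost of the normal convolution, so the ratio is 1.25 for any channel count that is divisible by 4.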

D. Impact of network hyper parameters
In Eq (5), the contribution ratio of the attention mechanism in the whole network is represented by the value of the hyperparameter ε. Changing the contribution ratio of the attention mechanism is therefore equivalent to finding the ε value that gives the best detection performance for the whole network. The SAR-SHIP-SET data are taken as an example. In the experiments, different detection results are obtained by setting different ε values, that is, by embedding the attention mechanism to different degrees. Because ε takes part in both forward and backward propagation, it is a learnable variable that can be corrected at every iteration, so the best-performing value of the hyperparameter can be obtained automatically by iteratively training the network. Table 1 ... Table 1 is 0.7562, which means that when ε is equal to 0. ... The SE value is calculated for the result images obtained after processing all the images in Fig 7, and then the average is taken. When ε is 0, there is no attention mechanism in the network, and the values of SE, SP, ACC and AUC are at their minimum. As the degree of attention increases, that is, as the value of the hyperparameter increases, the value of each evaluation parameter increases. Adding the attention mechanism therefore clearly changes the detection performance of the network. When ε is 0.2, the performance of the network improves only slightly. When ε is 0.6, the performance improves more, and increasing the attention contribution further improves the performance only slowly. For example, for ε = 0.6 and ε = 1, the values of each evaluation parameter are almost the same, indicating that the network performance changes little.
This shows that the attention mechanism has a great impact on the performance of the network, and that the contribution ratio of attention should be neither too small nor too large. If it is too small, the features extracted by the attention mechanism are weakened. It does not need to be too large either, because beyond a certain level the performance hardly changes. Fig 10 shows how the two hyperparameters $\varepsilon_1$ and $\varepsilon_2$ are automatically learned and adjusted as the number of network iterations increases. $\varepsilon_1$ and $\varepsilon_2$ are the hyperparameters of the first and the second attention mechanism in the network decoder, respectively. At the beginning of training, the initial value is set to one. When the number of iterations reaches 200, the hyperparameter values are basically stable. When the number of iterations reaches 250, the network detection performance is the best, and the values of $\varepsilon_1$ and $\varepsilon_2$ are about 0.75 and 0.25, respectively. This experiment shows that the attention mechanism is effective and that the value of the hyperparameter can be determined. The hyperparameter of the first attention mechanism is larger than that of the second. This is because the first attention mechanism is far from the output of the network and its features must pass through more convolution layers. To keep the attention features from being diluted, it needs a larger hyperparameter value so that the attention features contribute more. In contrast, the second attention mechanism is close to the output of the network, and only a small attention contribution is needed to extract good features.

Conclusions
Due to the interference of various factors during the SAR imaging process, it is difficult to effectively detect and segment ship targets in SAR images. To improve the efficiency of ship target detection, this paper constructed a new WA-CNN method. The basic framework of the WA-CNN network is the U-Net network, but its depth has been significantly reduced. A wavelet pooling layer is introduced in the encoder, where the dual-tree complex wavelet transform is used to suppress the speckle noise of SAR images, so that the features are better maintained than in a general network structure, which benefits the subsequent feature extraction. The attention mechanism layer designed in the decoder helps obtain global information, makes the information more complete and improves the utilization of the middle layers. The number of parameters of the entire network is very small, only 0.95M, so the complexity is significantly lower than that of other similar methods.
Many experiments were carried out on real SAR image datasets, and good experimental results were obtained, indicating that the WA-CNN proposed in this paper is feasible. In the next step, we will further improve the performance of the network structure and extend it to the detection of multiple different types of targets.